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Abstract 

We show that Markov couplings can be used to improve the accuracy of Markov chain 
Monte Carlo calculations in some situations where the steady-state probability distribution is 
not explicitly known. The technique generalizes the notion of control variates from classical 
Monte Carlo integration. We illustrate it using two models of nonequilibrium transport. 

Introduction 

Markov chain Monte Carlo (MCMC) algorithms generate samples from a probability distribution 
by simulating a Markov chain that leaves the distribution invariant. One estimates expected values 
by time averaging over long simulations lfT0l[T8l . For high-accuracy Monte Carlo computations, 
variance reduction methods are crucial. Unfortunately, some variance reduction methods are hard 
to apply in MCMC, particularly when there is no explicit expression for the steady-state probability 
distribution of the Markov chain. 

In this paper, we demonstrate a technique for MCMC variance reduction which can improve 
accuracy by factors of up to 2 or more in certain situations where an approximate steady-state 
distribution is known. The technique, which we call coupling control variates, builds on earlier 
work using Markov couplings in MCMC [fT3l [P71 l20l . Specifically, we assume that we can ob- 
tain an explicit approximation of the steady-state distribution, and that the expected values of this 
approximate distribution are known. The basic idea is to find a second Markov process which (i) 
leaves the approximate distribution invariant, and (ii) "shadows" {i.e., closely follows) the original 
Markov process. The expectations of the approximate distribution then provide an initial "guess," 
which we correct by simulating the two "coupled" processes to estimate the difference (in expected 
values) between the true steady-state distribution and our approximate distribution. 

We apply the technique to certain lattice models from statistical physics, in which the steady- 
state probability distribution is approximately a product of local distributions when the system is 
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out of equilibrium^ These systems are of interest in the theory of transport processes such as 
heat conduction. In this paper, we consider models consisting of a linear chain of lattice sites 
coupled to "heat baths" at each end; each bath is characterized by thermodynamic parameter(s) 
like temperature, chemical potential, etc. The steady-state probability distribution is a Gibbs- 
Boltzmann distribution if the bath parameters are equal. This is not the case for unequal heat baths. 
However, a large lattice out of equilibrium may still have a steady-state distribution that is locally 
in equilibrium, e.g., for heat flow, the statistics at a given location is approximately governed by a 
Gibbs-Boltzmann distribution with a local temperature (see Sect.[2]for details). We will show how 
such "local equilibrium" distributions can be used to achieve variance reduction. 

We note that similar ideas have appeared in previous studies. In addition to the works cited 
earlier, another example is "shadow hybrid Monte Carlo" in molecular dynamics [8J. See also 
for a version of this idea applied to Markov sensitivity analysis. Finally, we point out that Markov 
couplings have been used in a quite different way to perform exact Monte Carlo sampling |[T6ll . 



1 Coupling control variates 
1.1 General framework 

We begin by recalling the technique of control variates in classical Monte Carlo (MC) integra- 
tion [7]: suppose X is a random variable with probability density px, and we want to estimate its 
expected value X = E[X] = x ■ px(x) dx. The standard Monte Carlo estimator of X is 

1 - 

x n = — y x k , (i) 

k=l 

where Xi, X 2 , ■ ■ ■ , are independent samples from the distribution px- The variance of the estima- 
tor is Var[X n ] = Var[X]/n. It is not generally possible to improve the c/n scaling; more accurate 
estimates are usually obtained by reducing the variance of the estimand. 

A control variate for X is a random variable Y whose expected value Y — E[Y] is known and 
is correlated with X. One can estimate X using the control variate estimator 

1 n 

X C V,a;n = ~ ^ [X k + « • (Y - Y k )] . (2) 
k=l 

where Y k ), k — 1, 2, • • • , are samples from the joint distribution of X and Y, and a is an 
adjustable parameter. Optimizing Vax[Xcv,a;n] over a gives an optimal control variate estimator 
of X with variance 

i Var[X] • (1 - ply) , 

where pxy is the correlation coefficient Cov(X, Y)/(Var[X] ■ VarfF]) 1 / 2 . In the special case 
a — 1, Eq. © simply corrects the initial "guess" Y with an estimate of X — Y. 



'Here, "equilibrium" is used in the sense of statistical physics, i.e., "thermal equilibrium." This means that the 
Markov chain satisfies detailed balance [ 10 1, and the steady-state probability distribution is a Gibbs-Boltzmann distri- 
bution ^e~P H . Steady-state distributions of Markov chains that are not in equilibrium are known as "nonequilibrium 
steady states." We focus on the latter here. 
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Consider now Markov chain Monte Carlo, where the samples are not independent, but are suc- 
cessive states of a Markov process. For concreteness, let X t be a time-homogeneous continuous- 
time Markov process with finite state spaced fi. The dynamics of X t are completely specified by 
the transition rates R(x'\x), which tell us the rate at which X t jumps from state x to state x', i.e., 

Prob^X t+ At = x' X t = = R{x'\x) ■ At + 0(At 2 ) . We assume that the process X t has a 
unique steady-state probability distribution P, so that J2 x , R(x\x') P(x') = J2 X > R(^'\x) P(x). 

Given an observable : Q — > E, one can obtain a direct estimate of E x [(j)] = J2 x en ~P( X ) 
by simulating the process X t for t G [0, T] and applying the simple estimator 



T 



<f>(X t ) dt . (3) 



This converges almost surely to Ex[0] as T — > oo. The variance of 0t is given by the Kubo 
variance formula HI 

\lar\M ■ t „ 

+ 0(1/T 2 ) , (4) 



T 

where Var[0] is the variance of the observable <p with respect to P. The constant r is the integrated 
autocorrelation time 

poo 

r= / p(t)dt, 



where p{t) = C(t)/C(0) is the time-autocorrelation function of 4>{X t ), and 

C(t)= lim Cov(0(X t+to ),0(X to )) . 

to— *oo V / 

Note that r depends on both the observable <\> and the Markov process X t . 

As in the case of MC integration, it is not generally possible to improve the c/T scaling in 
Eq. ©. Variance reduction schemes typically aim to reduce either the autocorrelation time r or 
the variance C(0) of the estimand. 

To extend the notion of control variates to this setting, one looks for a second Markov process 
Y t which is correlated to the process of interest X t [fT5l[T7ll20ll . The notion of correlated processes 
can be made precise by Markov couplings [fT4l : if X t and Y t are Markov processes with respective 
transition rates Rx and Ry, a Markov coupling of X t and Y t is a specification of joint transition 
rates Rxy((x', y')\(x, y)) for transitions from (X t , Y t ) = (x, y) to (X t , Y t ) = (x', y'), so that 

Y. y > RxY((x',y')\(x,y)) = R x (x'\x) for all y, x, x' , and 

Y, x i R xY((x',y')\(x,y)) = Ry(y'\y) for all x, y, y' . 

In other words, a Markov coupling of X t and Y t is a Markov process on the product space Vt x Vt 
that gives a realization of X t when projected onto the first component, and likewise gives Y t when 
projected onto the second. 

Suppose a process Y t can be found such that the expectation Ey [cp] with respect to the stationary 
distribution Q of Y t can be computed easily. We define the coupling control variate estimator by 



4>{X t ) + a- (E y [0] - <j){Y t )) 



dt . (6) 



z Extending our ideas to more general settings is straightforward. See for instance Sect. 12.21 
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The process Y t is the coupling control variate. It is possible to estimate a nearly optimal a using 
the Kubo variance formula ©, but for simplicity we will always set a = 1 in this paperj^ In order 
for the coupling control variate to be effective with this choice of a, <f>(Y t ) — <j>(X t ) should have 
small variance, i.e., the states X t and Y t should remain as close to each other as possible. 

1.2 The coupling control variate algorithm 

Now, suppose we are interested in computing Ex[</>] for a Markov process X t with transition 
rates Rx(x'\x). Suppose further that the steady-state distribution P is not known, but that an 
approximate steady-state distribution Q is available. Our aim is to construct a coupled process 
(X t , Y t ) with transition rates R X y((x', y') | (x, y)) so that 

(i) The marginal X t has transition rates Rx, and therefore steady- state distribution P. 

(ii) The marginal Y t has steady-state distribution Q. 

(iii) X t and Y t remain as close as possible given constraints (i) and (ii). 

We show here how the coupling R X y((%', y')\(x, y)) can be constructed from a coupling R X x of 
two realizations of Rx processes. Such couplings are available in many situations; see Sect. 11.31 
The basic idea is to apply the Metropolis-Hastings algorithm using the second component of Rxx 
as proposal and the distribution Q as the target distribution. The result is a process Y t satisfying 
the detailed balance condition with respect to Q: 

Q(y') ■ RyW) = Q(v) ■ Ry(y'\y) • (7) 

Thus, the stationary distribution of Y t is Q. This is a straightforward generalization of the detailed 
balance condition for discrete time Markov chains; see, e.g., |[T0l[T8l . 

More precisely, recall that one way to simulate continuous-time finite-state Markov processes 
is as follows (sometimes known as the Gillespie algorithm [5]): let R(x) = ^2 X ,^- X R(x'\x) be 
the total exit rate from a state x E Vt. Let T n be the times at which the system jumps to the next 
state, and let X(ri) = X Tn+ be the state of the system after each jump. If X(n) = x, we set an 
exponential clock of mean 1 / R(x) . When the clock rings, we choose a new state x' with probability 
P(x'\x) = R(x'\x)/R(x) and set X(n + 1) = x'. Note that X t = X(n) for T n < t < T n+1 . 

The following simple algorithm generates one step of a coupled process (Xt, Y t ) satisfying 
conditions (i-iii) above: 

Algorithm. Let State = (x, y) be the current state of the joint process (X t ,Y t ). With rate 
Rxx(x',y'\x,y), set Proposal = (x',y'). Compute 



Q(x>) 


■ Rx(x\x') 


Q(x) 


■ R x (x'\x) 



With probability min(Z, 1), we accept Proposal and set NewState to (x 1 , y'). 

With probability 1 — min(Z, 1), we reject Proposal and set NewState to (x', y). 

3 For the models studied in this paper, it is expected that the optimal a will be w 1. In more general situations, it is 
important (and not difficult) to estimate an optimal a. 
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It is easy to check that the coupled process (X t , Y t ) generated by this algorithm satisfies Eq. ©. 
Thus, the estimator ©, when applied to [X t , Y t ), is always consistent in that (f) C ou P ie;T — ► Ey [0] 
as T — > oo. Note, however, that whether the variance of the coupling control variate estimator 
is lower than that of the simple estimator © depends on the coupling Rxx and the approximate 
distribution Q. 

Remark. We note that when computing the expectation of static observables using this algorithm 
for continuous-time Markov chains, one can reduce variance a little bit more by replacing the time 
intervals T n+ i — T n by the mean l/R(X(n)). 

1.3 Some practical considerations 

Approximate stationary distribution. The choice of Q is problem-dependent. In the nonequilib- 
rium models discussed in Sect.O as in many other physical situations, perturbative analysis of the 
relevant master equation often gives good candidates for Q. Note that because the coupling esti- 
mator is always consistent, it is not necessary to know a priori how good an approximation Q is 
to the true stationary distribution, so that one can take advantage of uncontrolled approximations. 
However, the degree of variance reduction depends on the distribution Q and the coupling Rxx- 
To choose the distribution Q, one should follow these criteria: 

(i) The expected value Eq [0] should be easy to compute. This is necessary in order to apply the 
coupling control variate estimator ©. 

(ii) The distribution Q should be "close enough" to the true stationary distribution P x that the 
rejection rate is low. We may then expect Y t to remain close to X t , so that the coupling 
control variate estimator may have low variance. 

Constructing couplings. How do we obtain a coupling Rxx to start with? As mentioned earlier, 
constructing Markov couplings is not always straightforward. However, couplings have long been 
used as a theoretical tool for studying the ergodic properties of Markov processes, and "good" 
couplings have been found for a broad range of stochastic models lfT4ll . In many (though not 
all) cases, it suffices to simply use the same sequence of random numbers to couple two Markov 
processes. Examples include stochastic differential equations that are contractive in the sense that 
their largest Lyapunov exponent is negative ifTTI and the models in Sect. [2] 

Factors affecting scaling of errors. The variance of the coupling control variate estimate is 

Var^,) = V "'«*> ■ + 0(1/7-) , (9) 

where r coup i e is here the integrated autocorrelation time of (f>(X t ) — 4>(Y t ), and Var[0(X) — 4>(Y)} 
is the variance of the random variable 4>{X) — 4>(Y) with respect to the stationary distribution of 
the coupled process on the product space x fi. Note that if the coupling is effective in keeping 
(f>(X t ) — 4>{Y t ) small, then the variance in Eq. © will be small. However, when a proposed move is 
rejected by our algorithm, the process Y t "stands still." The process Y t (and hence <j>(X t ) — 4>(Y t )) 
may therefore have a slower correlation time than X t . That is, the amount by which the variance 
of the estimator is reduced may reflect competition between lower variance and larger correlation 
time. 
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Figure 1 : The symmetric simple exclusion process. 



Overhead and running time. Another practical consideration is the complexity of Q and the cou- 
pling Rxx'- a "good" coupling that is computationally expensive to implement may not, in the 
end, be worth the effort. Couplings that are easy to implement, for example simply using the same 
sequence of random numbers, have a distinct advantage in this regard. 



2 Nonequilibrium transport processes 
2.1 Symmetric simple exclusion process 

The first model we consider is the symmetric simple exclusion process (SSEP) in one space di- 
mension [121 . This is a stochastic lattice gas model of a linear medium with a reservoir placed at 
each end. The two reservoirs are typically maintained at different densities, so that there is a net 
flow of particles through the medium. More precisely, the domain is a linear chain of N sites, with 
each site holding at most one particle at any given time. Thus, the state of the system a E £1 can 
be thought of as a binary string of length N, with = 2^. The dynamics are as follows: each 
particle carries an exponential clock of rate 1. When the clock rings, the particle will try to jump 
to a neighboring site, choosing left and right with equal probability; the particle does not move if 
the target site is occupied. The left reservoir will place a particle in site 1, when it is unoccupied, 
at rate a; and remove a particle from site 1, when it is occupied, at rate (3. The right reservoir acts 
on site N in an analogous manner, at rates 5 and 7, respectively. See Fig. [Q Note that the total 
particle number is conserved, except when the reservoirs inject or remove a particle. 

We begin by summarizing some known results on the SSEP; see fl4l [T2J for details. It is easy 
to show that the SSEP has a unique stationary distribution P N . Much is known about P N . In 
particular, various probabilities can be calculated exactly using the "matrix method." The SSEP 
thus provides a convenient test case for illustrating coupling control variates in nonequilibrium 
transport models. A central motivation for studying models like the SSEP is to understand how 
macroscopic transport processes arise from microscopic dynamics. One quantity of interest is the 
macroscopic density profile p : (0, 1) — > R, defined by 

p(x) = Urn E N [a [xN] ], x G (0, 1) , (10) 

N^oo 

where Ejv[-] denotes expectation with respect to P^. Another quantity of great interest is the 
correlation between distant sites (see below). 

Specifically, let p L = a/(a + (3) and pr = 5/(5 + 7). These quantities can be thought of as 
the particle densities of the reservoirs. When p L = p R = p , the SSEP satisfies detailed balance, 
and it is easy to check that the equilibrium distribution is 

N 

Piv(<r) = n^*) > (H) 

1=1 
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where = p an d p(0) = 1 — p - The occupation numbers become IID Bernoulli random 
variables. Note that this means p(x) = p . 
If p L 7^ p R , it can be shown that 

p(x) = p L - (1 - x) + p R ■ x . (12) 

The non-constant profile reflects the presence of a nonzero current. The stationary distribution Pn 
is no longer a product: the covariance Cov^ (crj, ctj) is nonzero for i ^ j. The dynamics no longer 
satisfies detailed balance. 

The large- iV scaling of spatial correlations is also known. Fix x, y, so that < x < y < 1. 
Then 01 

lim N ■ Cov N (a[ xN ],a [yN] ) =-(p R - p L ) 2 ■ x (1 - y) . (13) 

Thus, for N 3> 1 and i, j not too near the end points of (0, 1), we have Covjv (e>i, <Jj) = 0(1 /N). 
We note that this 1 /N scaling is not unique to the SSEP — it has been observed in other settings 
as well J2l |H [T31 [19l|. The correlation is thus quite weak for large N. This means that comput- 
ing correlations in nonequilibrium transport models like the SSEP presents numerical difficulties: 
when the covariances are 0(1/N) and the occupation numbers cr, themselves remain 0(1), a di- 
rect computation entails subtracting two quantities of like magnitude to estimate a much smaller 
number. 

To apply coupling control variates to this problem, we need an approximate stationary distri- 
bution Q and a coupling. For nonequilibrium transport models like the SSEP, a choice of Q is 
suggested by the notion of local thermal equilibrium (LTE): in physical terms, even though the 
system cannot be in thermal equilibrium because the two ends are in contact with reservoirs at 
different densities, for large N it is generally expected that small parts of the medium will reach 
approximate local thermal equilibrium (3). For the SSEP, it has been shown that LTE holds in 
the following sense: fix x E (0, 1) and a positive integer k. Then, as N — > oo with x and k 
fixed, the occupation numbers cr\ x M, cr\ x m+ij ' ' ' > a \xN}+k converge in distribution to independent, 
identically-distributed Bernoulli random variables with Prob(cr = 1) = p(x), where p is the linear 
profile given in Eq. (fTIl) . Heuristically, this tells us that even though the system cannot attain a 
global thermal equilibrium when pi ^ p R , it does approach local equilibrium when N ^> 1. It 
also suggests that we use as our approximate stationary distribution 

N 

<M*)=n*fa)' (i4) 

i=l 

where ^(1) = p(xi), ^(0) = 1 — p(x-i), and Xi = The distribution Q N can be thought of 

as a local equilibrium distribution, in which the sites are occupied independently with probability 
p[xi). The LTE property suggests that Q N may become a better approximation of as N — >• oo, 
at least locally. 

The other ingredient we need is Rxx, a coupling of the SSEP to itself, so that we can use the 
algorithm in Sect. 1 1.21 to construct a coupling control variate. This is straightforward [TT2l : given 
two copies of SSEP, we simply carry out the same moves in both copies whenever possible, and 
move independently when not. More precisely, let Moves (a) denote the set of all available moves 
for a, where a move means a particle jumping from site i to site j (for all i, j with \i — j \ = 1) or 
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changing the occupation number of site 1 or site N. To each move in Mov es(cr) U Moves(a), we 
attach an independent exponential clock of the appropriate rate — 1/2 for jumps, a for injection by 
the left reservoir, etc. When a clock goes off, check if the corresponding move is in Moves (a) R 
Moves(a), i.e., whether a and a can make the same move. If so, update both a and a accordingly. 
If the move is in Moves (a) \ Moves(a), i.e., if only a can make the move, then update only a. 
Similarly for moves in Moves(a) \ Moves(a). This algorithm couples two copies of the SSEP 
process. 

We can now apply the Metropolis-Hastings construction from Sect. |1.2[ This yields a coupling 
control variate for the SSEP, with Metropolis ratios Z given by the following table: 

Transition from site i to j, \i — j\ — 1 Za. ■ = — ■ 

Injection (removal) by left reservoir Z Um = ■ £ (Z Lm = l/Z Lm ) 
Injection (removal) by right reservoir Z Km = ■ j (Z Rout = l/Z Rm ) 

Note that the Z ratios involve only local quantities because the distribution Qn has product form. 
Note also that the rejection probabilities are quite small when N ^> 1: since pi — pj = 0(1/N), 
the Metropolis-Hastings ratios Z above are 1 + 0(1/N) (as long as < Pl, Pr < 1)- Thus, the 
Metropolis-Hastings algorithm rejects fewer and fewer samples as iV — > oo. 

Numerical results. To assess the effectiveness of the coupling control variate, we use a metric we 
call the error ratio 

eM = (Y^A) 1 ' 2 (15) 

\ Var^ [<t>\ J 

for a given observable <p. The error ratio measures the amount by which the estimator (f) coup i e 
improves the accuracy of the estimate. 

Fig. [2ta) shows the error ratio efcr^jv]] for the occupation numbers at a few selected locations 
along the chain, specifically x 6 {0.3,0.5,0.8}. The error ratio decreases with increasing N. 
The improvement with iV is expected, since the local equilibrium distribution Qn is expected to 
be a better approximation of the true stationary distribution P N when iV is big. Indeed, our data 
show that the rejection rate of the Metropolis-Hastings step decreases as N increases. In Fig. [2fb), 
the error ratio for the products u\ x ma\ y m are shown for pairs (x,y) at distances ranging from 
"infinitesimal" (nearest neighbors) to \x — y\ = 0.7. These results show that coupling control 
variates can effectively improve the accuracy of calculations involving hard-to-estimate quantities 
like spatial correlations. 

Fig.|3]shows the error ratios for the occupation numbers (T[ x n] as as functions of spatial location 
x E (0, 1), for iV G {50, 100, 500}. As can be seen, the error ratio has a strong dependence on 
spatial location, nearly vanishing at the boundaries but quickly attaining a near-linear profile in the 
interior of the domain. The figure show that some degrees of freedom couple better than others, and 
that sites in a "boundary layer" near the reservoirs couple especially well. An explanation is that 
in order for the two processes to couple at, say, site 1, we need only that their occupation numbers 
at site 1 agree, whereas for coupled moves to occur in in the interior of the system requires that the 
occupation numbers of two neighboring sites agree. In any case, despite this dependence on spatial 
location, overall the coupling control variate has improved the accuracy of MCMC estimates by a 
factor of > 40% for N w 500. 



DRAFT -- DRAFT -- DRAFT -- DRAFT -- DRAFT -- 



1.2 



o 






0.75 - 


S-H 








o 




fcl 


0.5- 








0.25- 




()L 




1.25 



500 




500 



N 



N 



(a) 



(b) 



Figure 2: The SSEP error ratio vs. system size N. In (a), we show the error ratio e/v (see text) for the 
estimated density at x = 0.3 (solid discs), x = 0.5 (open circles), and x = 0.8 (squares). In (b), we 
show the error ratio for the near-neighbor product ar x m • &[xN]+i with x = 0.5 (solid discs), and for the 
products ct^jv] • a [yN] with { x i 2/) = (0-4, 0.7) (open circles) and (x, y) = (0.2, 0.9) (squares). The errors 
are estimated using batched means estimators lfT8ll . The parameters are a = 2, j3 = 0.1, 5 = 0.3, and 7 = 1. 

We note that the coupling control variate estimator can be implemented with overhead of less 
than twice the running time of a single SSEP simulation. If we run two independent copies of 
SSEP simulations and average the results, the standard error of the resulting estimate will decrease 
by a factor of l/y/2 » 0.7, i.e., a 30% gain. We see that for single-site density estimates, the 
coupling control variate offers a noticeable improvement over simply running more copies of the 
simulation, and performs significantly better for two-site estimates. 

The Kubo formula © tells us that when the simulation time T is sufficiently large, the error 
ratio (fT5l) can be written as a product of two factors: 



Var[0(a) 



Var[0(<7)] J 



1/2 



I Tcouple 
V T 



1/2 



(16) 



= e mr; 7v[0] ■ e T . N [4>} . 

The reasoning in Sect. 11.21 suggests that the error ratio e N reflects both the gain in the first factor 
e var ;N by reducing variance, and possible loss due to an increase in the second factor e T] N, by 
increasing correlation times. To assess the situation, we have plotted e var ^[(j)\, with = <j x n for 
a few locations x, in Fig. ID^a). This curve should coincide with the plot of cn in Fig. |2a) ?/the 
correlation time of the SSEP were equal to that of the coupling control variate. Instead, we find 
that e var; x < c TV- Fig.Ub) shows the ratio of integrated autocorrelation times. As can be seen, the 
coupling control variate may increase correlation times at the same time that it reduces variance. 
Here, the reduced variance wins over the increased correlation time. 



2.2 KMP model 

The second model we consider is the Kipnis-Marchioro-Presutti (KMP) model AH. This is a 
stochastic idealization of a chain of iV coupled harmonic oscillators placed at the vertices of a 
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Figure 3: The SSEP error ratio for the occupation number a\ x m as a function of location x. The curves are, 
from top to bottom, N = 50, 100, 500. The parameters are a = 2, j3 = 0.1, S = 0.3, and j = 1. 
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Figure 4: The variance and correlation time components of the error ratio for the SSEP. In (a), we show 
the factor e var; N, as defined in Eq. (fTBT ), for (ft = crujv] with x = 0.3 (solid discs), x = 0.5 (open circles), 
and x = 0.8 (squares). In (b), we show the corresponding ratios of correlation times. Correlation times 
are computed by checking numerically that Kubo scaling (O is in effect (batched means estimates of the 
estimator error for integration times T G [10 5 , 10 7 ] show that the mean squared error ~ T -1 / 2 ). Then, the 
correlation time is "reverse-engineered" using the Kubo formula, and spot-checked by direct computation 
of time correlation functions. Variances are computed by time averaging for 10 8 time units. The parameters 
are a = 2, (3 = 0.1, 6 = 0.3, and 7 = 1. 
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regular lattice. We think of the ith oscillator as having energy e i9 given by a nonnegative real 
number, so that the state space is = [0, oo)^. Note that unlike the SSEP, is uncountable. At 
sites and iV + 1, we place "heat baths" with temperature T L and T R , respectively. There are 
thus N + 1 bonds in the system, linking site i with i ± 1 for i — 1, • • - , N. Associated with each 
bond is an independent exponential clock of rate 1. If the clock for the bond (i, i + 1) rings and 
1 < i < N — 1, then the energies of oscillators i and i + 1 are pooled together and redistributed 
randomly, i.e., ef — U • (e~ + ej +1 ) and ef +1 = (1 — U) ■ (e^ + e~ +1 ), where U is a uniform ran- 
dom variable on [0, 1] independent of everything else, e + denotes energy after the redistribution, 
and e~ denotes the prior energy. If the clock for the bond i = rings, S\ jumps to a new energy 
level u with probability density /3 L e~^<", (3 L = 1 /T L . Similarly for the bond (N, N + 1), but with 
parameter [3 R = 1/T R . Notice that the dynamics conserves energy except at sites 1 and N, just as 
the interior dynamics of the SSEP conserves particle number. 

The KMP process provide a simple microscopic model of heat conduction. When T L = T R = 
T , the system attains thermal equilibrium: the dynamics satisfies detailed balance, the stationary 
distribution Pjq is a product of Gibbs distributions with densities /3oe _ft)£ {(3q = 1/T ), and the 
temperature at all sites is equal to T . When Tl ^ T R , we have a linear temperature profile 

T(x) = T L ■ (1 - x) + T R ■ x , x G (0, 1) , (17) 

where T(x) = limA^oo Ejv[£fan]]- This non-constant profile reflects the flow of a nonzero energy 
current through the system. The spatial correlations have a similar scaling as the SSEP [2J: the 
limit 

c(x,y)= lim TV Covjv (s[ x n] ,£[yN]) 

N — >oo 

exists, and 

c(x, y) oc (Tr - T L f ■ x(l - y) , < x < y < 1 . 

Like the SSEP, Covn(e[ x n], %jv]) = 0(1/N). Thus, one encounters similar difficulties when 
estimating spatial correlations numerically. 

It has been shown that the KMP model attains LTE as N — » oo, i.e. k-site marginals converge 
to a product of Gibbs distributions, with a local temperature T(x) given by the linear profile above. 
This suggests that we use 

JV 

Q N {e) = \[(3 t e-^ , (18) 

8=1 

where /3j = 1/T(xi), as approximate stationary distribution. A simple coupling of the KMP 
process to itself is also available: given two copies of the KMP process, we make the same bonds 
"ring" at the same time. For interior bonds, we use the same uniform random numbers U to split 
energy in both copies; for heat baths, we set the boundary sites to the same new energy. The 
coupling is illustrated in Fig.[5J it entails having the e process use the same "randomness" as the e 
process to redistribute energy between nearby sites. 

One difference from the SSEP is that the KMP model has an uncountable state space, so the al- 
gorithm described in Sect. [TJrequires slight modification. This is straightforward for Markov jump 
processes with transition densities: one can simply replace the ratio of transition rate coefficients 
Rx in Eq. ([8]) with the ratio of the corresponding densities. The KMP process does not only have 
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Figure 5: Illustration of the KMP coupling. Because the interaction conserves energy, the point (Xi,Xi + i] 
is constrained to lie on the line Xj + Xi+i = const both before and after the interaction. 
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Figure 6: The KMP error ratio as a function of system size N. In (a), we show the error ratio for the 
estimated mean energies at x = 0.3 (solid discs), x = 0.5 (open circles), and x = 0.8 (squares) as functions 
of N. In (b), we show the error ratio for the near-neighbor product £\ x m • £[ x n]+i with x = 0.5 (solid discs), 
and for the products Ewm ■ £\ y m with (x, y) = (0.4, 0.7) (open circles) and (x, y) = (0.2, 0.9) (squares). 
The errors are estimated using batched means estimator. The parameters are Tl = 10 and Tr = 100. 



an uncountable state space, though — it also has singular transition rate measures (this is a conse- 
quence of energy conservation). Nonetheless, it can be checked that the ratios are well-defined in 
this case, and yield the following Metropolis ratios: 



Interaction resulting in (e i( Sj) t— > e'j), \i — j\ = 1 = exp ^[/3j£j + PjEj] — [^e- + PjSj] 

Left heat bath setting e\ i— > £j = exp — Pi) ■ (e[ — ei)) 
Right heat bath setting e n i-> ej, Z fi = exp ((^ - • (ej, - £„)) 

Applying the algorithm in Sect. 11.21 with these ratios yields a coupling control variate which pre- 
serves the local equilibrium distribution Q N . 

Numerical results. Fig. [6^a) shows the error ratios for various sites in the KMP model. As is the 
case for the SSEP, the coupling control variate significantly reduces the variance of the estimator. In 
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Figure 7: The variance and correlation time components of the error ratio for the KMP process. In (a), we 
show the factor e var -N, as defined in Eq. (fT6l ). for <fi = £\ x m for x = 0.3 (solid discs), x = 0.5 (open 
circles), and x = 0.8 (squares). In (b), we show the corresponding "reverse-engineered" ratio of correlation 
times. The parameters are Tl = 10 and Tr = 100. 

contrast to the SSEP, the amount by which the error is reduced depends more strongly on location, 
ranging from 20 to 60%. Fig. |7£b) shows the error ratios for the products £\ x m ■ £\ y m f° r pairs 
(x, y) located at various distances. These ratios are much more consistent and tend to « 40% for 
the range of N tested. 

Fig. Ha) shows the corresponding factor e var -N- As in the case of the SSEP, e var -,N is strictly 
smaller than the error ratio e^; at the same time, the ratio e T -N of correlation times increase; see 
Fig. Hb). Thus, Metropolis rejections can have a dramatic effect on the correlation time of the 
coupling control variate. Despite that, the overall performance of the coupling control variate 
estimator is quite good: even at its worst, the accuracy has been improved by 40%. 



Conclusion 

We have shown that Markov couplings, when available, can be used effectively to improve the 
accuracy of Markov chain Monte Carlo calculations. This method useful in situations where the 
stationary distribution is not known explicitly, as in the case of nonequilibrium transport models. 
As shown by the examples considered in this paper, good candidates for approximate stationary 
distribution can be found based on physical reasoning, and when an effective coupling is available 
for the Markov process at hand, one can construct an effective coupling control variate. 

The numerical results suggest various directions for improvement. In particular, the observa- 
tion that coupling control variate has larger correlation times than the original process suggests that 
one try to "trade" variance for correlation time. However, simple ideas like resampling the energy 
of random sites at random times, as in heat bath / partial resampling, may very well increase vari- 
ance more than it decreases correlation time, resulting in a net gain of error. A related issue is the 
dependence of the estimator error ratio on observables: in many applications, it is desirable to be 
able to optimize the error ratio only for observables of interest. (One does not expect to be able to 
have small error ratios for all observables unless the approximate and true stationary distributions 
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are close in the total variation norm.) 

Finally, we mention that it might be possible to use related coupling methods for sensitivity 
analysis. If the Markov process depends on parameters 9, then the observable <p in Eq. © becomes 
<pe and the sensitivities are derivatives of <\>q with respect to 9. Sensitivities are used, for example, 
in numerical computation of optimal stochastic controls in situations where the curse of dimen- 
sionality makes dynamic programming impractical. If there is a known formula for the stationary 
distribution P e , two common methods for evaluating sensitivities, the common random variables 
(or same paths) method^ and the likelihood ratio (or score function) methods. Glynn H and others 
have generalizations of the likelihood ratio method to situations where T is known but not P. It 
also might be helpful to have such a generalization of the same paths method. 
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